1. Introduction


The so-called Startup Act (Decree Law 179/2012, converted into Law 221/2012) introduced in
Italy the notion of innovative companies with high technological value, i.e., innovative
startups. Among them, the Italian government includes the category of social startups, i.e.,
“startup innovative a vocazione sociale” (hereafter SIAVS), a relatively new field of interest
from both a scientific and a normative perspective.

SIAVS must satisfy the same requirements as other innovative startups, but they operate in
sectors such as social assistance, education, health, social tourism and culture, and also
enjoy some tax benefits. Furthermore, they may have a direct (social) impact on collective
well-being, measured through a self-evaluation document named “Documento di Descrizione
dell’Impatto Sociale”, published yearly by each SIAVS (Vesperi, Lenzo). Today, the number of
social startups has more than doubled compared with five years ago.

Within the Italian academic debate on startups and innovative enterprises, SIAVS have been
considered for their hybrid nature, balancing profit and non-profit business models, and for
their role in producing value for local communities (Vesperi et al., 2015). Although there are
some recent empirical studies on social entrepreneurship intentions (Bacq et al., 2016), little
is known about the territorial pattern of SIAVS, even though a certain similarity with the
territorial distribution of startups overall has been observed at the regional scale (Maglio,
2019). Italian non-profit organizations present different characteristics compared to innovative
companies, notably in the gender balance of the workforce and in territorial diffusion (Istat,
2019; Forum del Terzo Settore, 2017).

The aim of this paper is to investigate the relevant factors influencing the presence of social
startups in Italy at the provincial level. The outcome variable is the number of active social
startups in Italian provinces, while the set of explanatory variables is composed of economic
and demographic indicators at the provincial level.

Regarding the explanatory variables, the unemployment rate and the number of incubators have
been used as predictors of the number of startups at the regional level in Colombelli (Quartaro),
while Hoogendoorn (2016) considers GDP per capita. Information on registered firms is also
used by Colombelli et al. (2019) to predict the number of new firms at the provincial level
(NUTS 3 regions). Furthermore, the effectiveness of incubators for Italian startups is still
under debate (Deidda Gagliardo et al., 2017), while Sansone et al. (2020) have introduced a new
taxonomy distinguishing between business, mixed and social incubators. We also consider other
variables, such as broadband, which can be viewed as a proxy for the technological level of a
province, and the percentage of NEETs (people aged 15 to 29 who are neither in employment nor
in education or training), which is a measure of the non-attractiveness of a territory for
young people.

Generalized linear models (GLMs) for discrete outcomes are applied and compared, also taking
into account the zero-inflation issue arising from the distribution of these particular data.
2. Materials and Methods


Data 


Information regarding startups and certified incubators is retrieved from the Italian Chambers
of Commerce [2], updated to the third quarter of 2020. Additional variables at the provincial
(NUTS 3) and regional (NUTS 2) levels, as well as the spatial coordinates of the provinces, are
obtained from the Italian National Institute of Statistics [3] (ISTAT) and the European
Statistical Office [4] (EUROSTAT).

A possible drawback is that some variables suffer from a timeliness issue. However, for the
purpose of this explorative study, the issue seems less severe, since variations at the
provincial level are reasonably small in the short term. Thus, we retrieved the latest update
(i.e., the value for the last available year) for all considered covariates. In some cases we
consider the geometric mean over several years to limit problems related to possible temporal
variations.
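As a minimal sketch of this smoothing step, assuming a covariate observed over several years for a single province, the geometric mean can be computed as follows. The yearly values and the name `yearly_rates` are hypothetical and serve only to illustrate the calculation.

```python
from statistics import geometric_mean

# Hypothetical yearly values of a rate for one province (not real data).
yearly_rates = [0.058, 0.061, 0.055, 0.060]

# Geometric mean: the n-th root of the product of the n values.
smoothed = geometric_mean(yearly_rates)

# Equivalent explicit computation, shown for comparison.
product = 1.0
for value in yearly_rates:
    product *= value
explicit = product ** (1.0 / len(yearly_rates))
```

The geometric mean always lies between the smallest and the largest yearly value, so a single unusual year moves the covariate less than taking the latest value alone.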


Measurement Variables 


The dependent variable is the count of SIAVS in Italian provinces. The sample size is
n = 105, comprising all Italian provinces except “Sud Sardegna” and “Barletta-Andria-Trani”,
which do not host any kind of startup in their territory.

As mentioned, we identified the following candidates as possible determinants of the presence
of SIAVS (the reference period is given in parentheses):


Population density, number of inhabitants divided by the area of the province (reference year:
2017). A logarithmic transformation is applied;

GDP per capita, in thousands of euros (reference year: 2017);

Incubators, count of certified incubators in each province. These particular companies (Decree
Law 179/2012) are registered with the Italian Chambers of Commerce and offer services for
the development of startups (reference year: 2020);

Unemployment rate, at the provincial level (reference year: 2019);

Registration rate, companies registered in a year divided by the total number of companies
registered in the previous year in the Italian “registro imprese” at the provincial level
(geometric mean between 2015 and 2018);

Broadband, number of ultra-broadband subscriptions as a percentage of inhabitants in each
province (geometric mean between 2015 and 2017);

Social employees, rate of workers in social cooperatives (reference year: 2019);

NEETs, percentage of people aged 15 to 29 who are neither in employment nor in education or
training (reference year: 2019).


Statistical Models 


The number of SIAVS in Italian provinces can be modelled using the GLM family (see e.g.
Nelder, Wedderburn; McCullagh, Nelder, among others). The general formulation of a GLM
(Agresti, 2003) relies on a link function $g(\cdot)$, which maps the expectation of the
response variable, i.e. $\mu_i = E(Y_i)$, to the linear predictor:


$$g(\mu_i) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}, \qquad i = 1, \ldots, n \qquad (1)$$


where $p = 8$ is the number of variables previously discussed.
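As an illustration of how equation (1) is used with a log link, the sketch below inverts the link to obtain the expected count for one province. The function name `expected_count`, the coefficients and the covariate values are hypothetical and are not taken from the fitted models.

```python
import math

def expected_count(intercept, coefficients, covariates):
    """Invert the log link: mu = exp(beta_0 + sum_j beta_j * x_j)."""
    if len(coefficients) != len(covariates):
        raise ValueError("coefficients and covariates must have equal length")
    linear_predictor = intercept + sum(
        b * x for b, x in zip(coefficients, covariates)
    )
    return math.exp(linear_predictor)

# Hypothetical coefficients and covariate values for one province.
beta_0 = 0.5
betas = [0.3, 0.2]
x_i = [1.0, 2.0]

# Linear predictor: 0.5 + 0.3 * 1.0 + 0.2 * 2.0 = 1.2
mu_i = expected_count(beta_0, betas, x_i)
```

Under the log link, a one-unit increase in a covariate multiplies the expected count by the exponential of its coefficient, which is how the estimates of a count regression are usually read.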


[2] https://www.registroimprese.it/
[3] https://www.istat.it/
[4] https://ec.europa.eu/eurostat
In this context, two main competing models can be considered: Poisson (POI) and Negative
Binomial (NB) regression. In the former case, $Y_i \sim \mathrm{Poi}(\lambda_i)$ and the
corresponding log-link function is $g(\lambda_i) = \log \lambda_i$, while in the latter case
$Y_i \sim \mathrm{NegBin}(\mu_i, \omega)$. In the POI model, the observed counts are assumed to
be equidispersed, i.e. $E(Y_i) = \mathrm{Var}(Y_i) = \mu_i$. By contrast, the scale parameter
$\omega$ in the NB model accounts for the presence of overdispersion, i.e.
$\mathrm{Var}(Y_i) = \mu_i + \mu_i^2/\omega$.
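The difference between equidispersion and overdispersion can be checked numerically. The sketch below is only illustrative: it assumes the mean parameterization of the negative binomial, with the scale parameter called `omega` in the code, and uses hypothetical parameter values. It sums the probability mass functions to recover the mean and variance of each distribution.

```python
import math

def poisson_pmf(k, lam):
    """P(Y = k) for Y ~ Poi(lam)."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def negbin_pmf(k, mu, omega):
    """P(Y = k) for a negative binomial with mean mu and scale omega."""
    log_p = (
        math.lgamma(k + omega) - math.lgamma(omega) - math.lgamma(k + 1)
        + omega * math.log(omega / (omega + mu))
        + k * math.log(mu / (omega + mu))
    )
    return math.exp(log_p)

def mean_and_variance(pmf, upper=500):
    """Numerically sum the first two moments over k = 0, ..., upper - 1."""
    mean = sum(k * pmf(k) for k in range(upper))
    second = sum(k * k * pmf(k) for k in range(upper))
    return mean, second - mean ** 2

mu, omega = 2.0, 0.5  # hypothetical values, not estimates from the data
poi_mean, poi_var = mean_and_variance(lambda k: poisson_pmf(k, mu))
nb_mean, nb_var = mean_and_variance(lambda k: negbin_pmf(k, mu, omega))
```

With these values both distributions have mean 2, but the Poisson variance equals the mean while the negative binomial variance is 2 + 4/0.5 = 10, as implied by the variance formula above.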

A possible issue related to the count of SIAVS (and of startups in general) is an excess of
zeros in the data, i.e. provinces without any registered SIAVS. The models introduced above can
therefore be modified to take zero inflation into account. The zero-inflated Poisson (ZIP) model
is derived as a mixture of a binary logistic model and a POI model (Lambert, 1992). The
responses $Y_i$ are independent, with $Y_i = 0$ with probability $\pi_i$ and
$Y_i \sim \mathrm{Poi}(\lambda_i)$ with probability $1 - \pi_i$. The resulting link function can
be written as follows:


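$$g(\mu_i) = \begin{cases} \log\dfrac{\pi_i}{1 - \pi_i}, & \text{if } y_i = 0 \\ \log \lambda_i, & \text{if } y_i > 0 \end{cases} \qquad (2)$$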


The zero-inflated NB (ZINB) model, introduced in Greene (1994), is obtained by replacing the
POI component with the NB one (when responses are not equal to zero). We remark that ZIP and
ZINB assume that the zero-inflation effect is generated by a process separate from the one
generating the count values.
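To make the mixture structure concrete, the sketch below evaluates the probability mass function implied by the ZIP definition: a zero can come either from the degenerate component (with the mixing probability) or from the Poisson component. The parameter values and the names `pi_i` and `lambda_i` are hypothetical.

```python
import math

def zip_pmf(k, pi, lam):
    """P(Y = k) under a zero-inflated Poisson with mixing probability pi."""
    poisson = math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))
    if k == 0:
        # Zeros arise from both the degenerate and the Poisson component.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson

pi_i, lambda_i = 0.3, 2.0  # hypothetical values, not estimates from the data
prob_zero = zip_pmf(0, pi_i, lambda_i)
total = sum(zip_pmf(k, pi_i, lambda_i) for k in range(200))
mean = sum(k * zip_pmf(k, pi_i, lambda_i) for k in range(200))
```

In this example the probability of a zero rises from exp(-2) under a plain Poisson to 0.3 + 0.7 exp(-2), while the mean shrinks to (1 - 0.3) * 2 = 1.4. A ZINB version would follow the same pattern with the Poisson term replaced by a negative binomial one.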


3. Results and Discussion 


The number of SIAVS is equal to 240, and 87.5% of them are classified in the service sector.
The remaining 12.5% is divided between the industry and/or craft sector (7.9%) and sectors such
as agriculture, tourism and commerce (4.6%). Registered SIAVS cover almost 40 activity codes.
The main activities of SIAVS can be divided into: a) software production and IT consultancy
(17.5%), b) scientific research and development (12.9%), c) education (10.4%), d) information
and other services (9.6%), e) non-residential social assistance (8.8%), f) activities related to
libraries, archives and museums (3.8%), g) art and entertainment (2.9%). The remaining 34.2%
of SIAVS are classified under 33 different activity codes.

Almost a quarter of SIAVS (24.2%) are located in the province of Milan (58), while the
provinces of Rome and Turin host 27 (11.2%) and 13 (5.4%) SIAVS, respectively. In general, 65
provinces (62%) contain at least one SIAVS, but only 20 provinces (19%) have registered more
than 2 social startups. SIAVS also show a higher frequency of female prevalence (measured as at
least 50% of women in the company) compared to other startups, exceeding them by 10%.
Moreover, no practical differences are found (with respect to other startups) regarding the
proportion of young people (under 35) and foreigners.

In Figure 1 we can observe the distribution of startups (left panel) and the distribution of
SIAVS (center panel) at the provincial level, and the distribution of non-profit institutions
at the regional level (right panel). The main differences between startups and SIAVS are seen
in the provinces of Central Italy. Nonetheless, both startups and SIAVS are concentrated in the
metropolitan areas (especially in the provinces of Milan and Rome), and non-profit institutions
are also found especially in the North-West (Lombardia Region). In addition, the provinces of
Sardinia show the lowest counts of startups and SIAVS, even though the number of non-profit
institutions appears comparable with that of the other regions.

Table 1 summarizes the main results of the statistical models discussed in Section 2. First,
we check the usefulness of the whole set of regressors in all models by observing the decrease
in the Bayesian Information Criterion (BIC) between the null models ($\mathrm{BIC}_0$), which
include only the intercept, and the models with all considered covariates. For each model, the
BIC is a function of a different likelihood, and the decrease is (numerically) more evident in
the POI and ZIP models than in the NB and ZINB models. A similar check can also be carried out
(only for the first two models) through McFadden’s pseudo $R^2$. We also report, for each model
specification, the likelihood ratio test statistic to formally test for the departure from the
“null” model (which only includes the intercept) and its associated p-value. This check also
confirms the usefulness of the proposed regressors. We remark that the four models cannot be
properly compared in terms of likelihood-based statistics. Therefore, we use a leave-one-out
cross-validation (CV) approach to compare their predictions, estimating each model $R = 105$
times and then computing $\mathrm{MSE(CV)} = n^{-1} \sum_{r=1}^{R} (\hat{y}_r - y_r)^2$.
According to this performance indicator, the conventional POI model exhibits the lowest
MSE(CV), followed by ZIP and ZINB. Finally, the results (not reported here) of two Vuong tests
(Vuong, 1989) suggest rejecting the null hypotheses of POI and NB in favour of ZIP and ZINB,
respectively.
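The leave-one-out procedure can be sketched generically as follows. This is not the code used for the analysis: the helper `loo_cv_mse` and the intercept-only placeholder model are hypothetical, and the toy counts are not the SIAVS data. In practice, `fit` and `predict` would wrap the estimation and prediction routines of the POI, NB, ZIP or ZINB model.

```python
def loo_cv_mse(X, y, fit, predict):
    """Leave-one-out CV: refit on n - 1 rows, predict the held-out row."""
    n = len(y)
    squared_errors = []
    for r in range(n):
        X_train = X[:r] + X[r + 1:]
        y_train = y[:r] + y[r + 1:]
        model = fit(X_train, y_train)
        y_hat = predict(model, X[r])
        squared_errors.append((y_hat - y[r]) ** 2)
    return sum(squared_errors) / n

# Placeholder model: intercept-only predictor (the training mean).
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)

def predict_mean(model, x_row):
    return model

# Hypothetical toy counts, not the SIAVS data.
X_toy = [[0.0], [1.0], [2.0], [3.0]]
y_toy = [0, 1, 0, 3]
mse_cv = loo_cv_mse(X_toy, y_toy, fit_mean, predict_mean)
```

Because each prediction is made for an observation excluded from estimation, the resulting MSE(CV) is comparable across models with different likelihoods, which is the reason for preferring it here to likelihood-based criteria.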


[Map panels, left to right: Startups, SIAVS, Non-profit Sector]


Figure 1: Geographical distribution of number of startups, number of SIAVS (provincial level) 
and non-profit sector (regional level). 


Conventional GLMs identify log population density, (certified) incubators and broadband as
positive determinants of the counts of SIAVS at the provincial level, considering a nominal
error rate of 1%. Conversely, in the more robust zero-inflated regressions, the coefficient of
population density is no longer statistically significant. Instead, in ZIP and ZINB, the
unemployment rate is identified as a possible positive driver of the emergence of SIAVS, while
the percentage of young people neither in employment nor in education or training can be
considered a negative indicator. Surprisingly, GDP per capita and social employees are not
statistically significant in any of the considered models.

Certified incubators appear fundamental for the presence of SIAVS. At a descriptive level,
64% of SIAVS (153) are located in provinces hosting at least one certified incubator. This
percentage is slightly lower when considering all innovative startups (56%).

To conclude, SIAVS arise in provinces with higher technological levels that include an
ecosystem to develop and assist startups. Based on our results, population density and
unemployment may also influence the presence of SIAVS, but further investigation will be
conducted at the territorial level.
Future analysis will concern the trend of new SIAVS over time (using quarterly data), also
considering autoregressive models for integer data (see e.g. Palazzo, 2019).


Table 1: Input variables setting scheme used in each model.


                          POI             NB              ZIP             ZINB
Intercept                 0.7569          -0.3718         0.7480          0.7441
log(Population density)   0.3491 ***      0.3949 **       0.1636          0.1637
Incubators                0.2373 **       0.2326 *        0.2269 **       0.2269 **
Registration rate         -0.2995 .       -0.2228         -0.1976         -0.1972
Unemployment rate         0.0468          0.0409          0.1512 ***      0.1512 ***
Broadband                 0.1969 **       0.1810 **       0.1987 ***      0.1987 ***
GDP per capita            -0.0304         -0.0184         -0.0146         -0.0146
Social employees          -0.0746         -0.0591         -0.0500         -0.0499
NEETs                     -0.0312         -0.0247         -0.0764 **      -0.0764 **
BIC_0                     803.8809        415.8049        714.4403        420.4589
BIC                       359.6001        362.5449        357.1889        361.8049
McFadden's R^2            0.6025          0.2226          -               -
LR Test                   481.5100 **     90.4920 ***     431.7100 ***    133.1200 ***
MSE(CV)                   6.1835          25.5918         10.7048         10.7284


Significance codes: 0 < ‘***’ < 0.001 < ‘**’ < 0.01 < ‘*’ < 0.05 < ‘.’ < 0.1 < ‘ ’ < 1
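As a cross-check on the fit statistics reported in Table 1, the likelihood ratio statistic and McFadden's pseudo R-squared of the POI model can be recovered from the two BIC values alone. The sketch below assumes that the BIC is computed as -2 log L plus the number of parameters times log(n), with 1 parameter in the null model and 9 in the full model (intercept plus 8 covariates); these parameter counts and the helper name `deviance_from_bic` are assumptions, not statements from the analysis.

```python
import math

def deviance_from_bic(bic, n_params, n_obs):
    """Recover -2 log L from BIC = -2 log L + n_params * log(n_obs)."""
    return bic - n_params * math.log(n_obs)

n = 105  # number of provinces in the sample

# POI column of Table 1. Parameter counts are an assumption:
# 1 for the null model (intercept), 9 for the full model.
null_dev = deviance_from_bic(803.8809, 1, n)
full_dev = deviance_from_bic(359.6001, 9, n)

# Likelihood ratio statistic: difference of the two -2 log L values.
lr_statistic = null_dev - full_dev

# McFadden's pseudo R^2: 1 - logL_full / logL_null (ratio of -2 log L).
mcfadden_r2 = 1.0 - full_dev / null_dev
```

Under these assumptions the recovered values agree, up to rounding, with the LR Test (481.51) and McFadden (0.6025) entries reported for the POI model.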